Abstract



We study the structure of inter-industry relationships using networks of money flows between industries in 20 national 
economies. We find these networks vary around a typical structure characterized by a Weibull link weight distribution, 
exponential industry size distribution, and a common community structure. The community structure is hierarchical, 
with the top level of the hierarchy comprising five industry communities: food industries, chemical industries,
manufacturing industries, service industries, and extraction industries.

Key words: industrial network, input/output table, money flows, national accounting, macroeconomics
1. Introduction

Goods in an economy are produced by a network of
industries, where each industry produces goods by combining
the output of others. The structure of this network
may provide clues to how economies function and eventually
shed light on how economies change over time. While
direct data on physical production flows between industries
are unavailable, data on money flows are. This study
presents some initial findings about the structure of this
money flow network, with a particular emphasis on patterns
that are shared across economies and can serve as
targets for statistical physics models.

Money flows fall into a number of large categories of
transactions, such as output, consumption, income, and
investment. Also included are the somewhat smaller (though
still large) flows between industries. National accounting
provides a system for cataloguing these money flows.
Although national accounting does not use network terminology
to describe these flows, they are naturally expressed in
these terms, with links representing flows and nodes
representing industries or sectors. Here, we focus on a subset
of money flows, those within the business sector, which
comprises the industries of an economy (Fig. 1). The
resulting web of industrial trading is therefore not a closed
network but an open one, with flows entering and exiting
from outside.

Our data comes from input/output (I/O) tables, which
are part of the national accounting data compiled by national
statistical agencies. The I/O tables are quite similar
to adjacency matrices, with several additional rows and
columns added to account for boundary flows, changes to
stocks, and special categories of goods, as well as separate
tables to account for import flows.

A few studies have already applied network approaches
to the I/O tables. These studies are roughly divided between
empirical studies of structure [1-4] and theoretical
models of dynamics [4-6]. The structure studies suggest
the existence of clustering among industries. Carvalho [4]
further finds an asymmetry between in-flows and out-flows
of industries that implies an asymmetry between industries
as providers of goods and users of goods. While different
industries tend to require similar numbers of input goods,
they may provide inputs to either many or few other industries,
showing that some industries are general purpose
providers while others are specialists. The models of dynamics
have focused on the role that network structure
may play in economy-wide fluctuations, by modeling how
shocks propagate through the web of industries. However,
despite previous work, many basic properties of these networks
remain uninvestigated.

This paper is a step towards eventually building network
models of economies. We begin by explaining some
basic principles of national accounting and the measurement
basis for money flows, since these concepts may not
be common knowledge among physicists. We then analyze
industrial networks in terms of topology, flow size distribution,
industry size distribution, and community structure.
Our findings suggest that industrial networks have
rich structure that is susceptible to analysis using complex
systems approaches.

The paper is organized as follows. In Section 2 we
explain the principles of national accounting. In Section
3 we describe the data set. In Section 4 we discuss the
topology, flow size distribution, industry size distribution,
and community structure of the industrial networks in our
data set. In Section 5 we discuss our results.

Preprint submitted to Elsevier

March 2, 2013

2. National accounting 

Measurement of money flows involves substantially more
complications than measurement of other kinds of network
flows, such as energy, information, or air passenger traffic,
due to the many categories of transactions that are
separately accounted for and the conventions of national
accounting. In this section we briefly describe national
accounting methods for quantifying these flows.

First it is useful to describe the general structure of 
economies; this broader context helps make sense of the 
logic behind industry network data. Economies are composed
of five "institutional sectors" or simply "sectors" for
short: households, non-financial business, financial business,
government, and non-profits (Fig. 1). The largest
money flows are "household consumption" - purchases of
business sector goods and services by the household sector
- and "value added". Value added partly corresponds
to purchases of household sector labor by the business 
sector, though it contains other components as well, as 
we discuss further on. Household consumption and value 
added collectively are referred to as the "circular flow" by 
economists and constitute the backbone of sectoral money 
flow structure. Note that the circular flow in this sense 
refers only to the monetary aspect of the economy. Bio- 
physical flows, which also have a circular component, are 
maintained by boundary flows from free energy to wastes 
that have no monetary analog. 

Next in size is "intermediate consumption" by the busi- 
ness sector. Intermediate consumption represents pur- 
chases made by industries for goods produced by other 
industries. Whereas household consumption goods are in- 
trinsically desirable - a bottle of cola, say - goods pur- 
chased for intermediate consumption are not, but are in- 
puts required to produce intrinsically desirable goods - 
e.g., carbonated water, syrup, and glass. Intermediate con- 
sumption is recorded in input/output tables, which are a 
part of the accounting system used by national statistical 
agencies to record economic activity. 

Capital purchases are an important exclusion from transactions
classified as intermediate consumption. Capital
purchases are purchases of goods that aid the production
of other goods and can be used repeatedly over time
- a bottling machine, say. Goods are classified as capital
goods when they can be used repeatedly for more than one
accounting period, usually one year. Most input/output
tables only record the industry selling a capital good and
not the industry buying it. Thus, instead of a full adjacency
matrix of capital purchases, input/output data usually
only records a vector of capital revenues received by
each industry.

The transactions underlying money flows in the network
are compiled on an "accrual basis". Under this accounting
system, revenues are recognized when they are
earned by the transfer of goods or the performance of a 
service. Expenses are recognized when the associated rev- 
enues are earned. To see how this affects the recording of 
money flows, consider a car maker purchasing steel, pro- 
ducing a car, and selling it over some period of time. Un- 
der an accrual basis, the accounts of the car maker - and 
those of the automotive industry in the input/output ta- 
bles - will record sales revenue being received when the 
car is transferred to the consumer, even if the full pur- 
chase price is not paid immediately. At the same time, 
the steel expense will be matched to the car that it helped 
produce, even though such expenses were actually incurred 
earlier. The alternative method of recording transactions 
is "cash-flow basis" accounting, in which transactions are 
recognized when money is paid or received. Accrual basis 
accounting can be thought of as a pseudo-goods tracking 
approach, because it follows the movement of goods rather 
than the movement of money.^1

Money flows within the full sector-level network are
not conserved for at least two reasons. First, money may
disappear from accidental loss or destruction. Second and
more importantly, money is regularly created and destroyed
by the financial sector. National accounting does enforce
a virtual conservation law, though, through the use of balancing
items, which are accounting entries that are calculated
as the difference between other accounting entries. In
the I/O tables, the balancing item is value-added, which is
calculated as the difference between total sales by the business
sector and intermediate consumption sales. Value-added
"measures the value created by production" [7] and
encompasses all forms of personal income - employee compensation,
interest, dividends, and rent, as well as certain
kinds of taxes and depreciation.

Finally, though it is not essential to our purpose of
analyzing industry networks, it is useful to understand how
GDP is calculated and how it relates to industry networks.
Exploiting the conservation enforced by the definition of
value-added, one can equate money flows in and out of the
business sector:

$$
\begin{aligned}
&\text{value added} + \text{intermediate consumption} + \text{imports} + \text{business taxes} \\
&\quad = \text{intermediate consumption} + \text{household consumption} + \text{government consumption} \\
&\qquad + \text{capital formation} + \text{exports} + \text{subsidies}
\end{aligned}
\tag{1}
$$



^1 An important special case of the distinction between accrual
basis and cash-flow basis transactions is depreciation flows.
Depreciation in accounting is the assignment of portions of a fixed expense
to multiple time periods. Depreciation transactions are recorded as
though the depreciable asset is consumed over time. The consumption
of a depreciable asset is thus recognized as a transaction many
times throughout the depreciable lifetime of the asset, even though
no literal cash flow occurs.
Figure 1: (a) Flows of money between sectors in an economy. The dashed box indicates the scope of the I/O tables.
The 3 "gates" X, O, and V show approximately where the 3 methods for measuring GDP capture flows to compute
GDP. (See Eq. (2).) Gate X corresponds to the "expenditure approach", gate O to the "output approach", and gate
V to the "value-added approach". Although the finance sector is shown apart from the business sector, the I/O tables
do include inferred service fee flows between finance and industries of the business sector. Credit flows and deposits,
however, are outside the scope of the I/O tables. For clarity, several features are not shown: credit flows from non-finance
sectors, business taxes, government subsidies, transfer payments, government imports, government self-flows, investment
to foreign countries. Not all interest flows are shown, but can be inferred from credit flows. One of the five SNA sectors,
non-profits, is also not shown for clarity. (b) Flows through a particular industry of the business sector.



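Or, by rearranging terms,

$$
\begin{aligned}
&\text{value added} + \text{business taxes} - \text{subsidies} \\
&\quad = \text{household consumption} + \text{government consumption} + \text{capital formation} + \text{exports} - \text{imports} \\
&\quad = \text{GDP}.
\end{aligned}
\tag{2}
$$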

The left hand side represents the "income approach" to
calculating GDP, in which forms of income are summed.
The right hand side represents the "expenditure approach"
to calculating GDP. By using the identity "value added =
gross output - intermediate consumption", a third approach
can be derived - the "output approach" - where value
added is calculated as the difference between all business
sales and intermediate goods sales. All three approaches
are used by statistical agencies to validate GDP calculations.
They also provide equivalent intuitive interpretations
of GDP as a measure of total income, a measure of
total expenditures, and as the net output of the business
sector.
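As a concrete check of Eqs. (1) and (2), the following Python sketch computes GDP by the income and expenditure approaches from a set of aggregate flows. All numbers and variable names are hypothetical, chosen only for illustration; they are not taken from any I/O table. Value added is treated as the balancing item, as described above.

```python
# Hypothetical aggregate flows for a toy economy (arbitrary currency units).
intermediate_consumption = 400.0
household_consumption = 500.0
government_consumption = 150.0
capital_formation = 200.0
exports = 120.0
imports = 100.0
business_taxes = 60.0
subsidies = 20.0

# Right-hand side of Eq. (1): money flowing into the business sector.
inflow = (intermediate_consumption + household_consumption
          + government_consumption + capital_formation + exports + subsidies)

# Value added is the balancing item: whatever makes Eq. (1) hold.
value_added = inflow - (intermediate_consumption + imports + business_taxes)

# Income approach: left-hand side of Eq. (2).
gdp_income = value_added + business_taxes - subsidies

# Expenditure approach: right-hand side of Eq. (2).
gdp_expenditure = (household_consumption + government_consumption
                   + capital_formation + exports - imports)

print(value_added, gdp_income, gdp_expenditure)
```

Because value added is defined as a residual, the two GDP figures agree by construction; intermediate consumption cancels from both sides and does not affect GDP.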

3. Description of data 

Our data comes from 1997 I/O tables produced by the
Organization for Economic Cooperation and Development
(OECD) [8]. The tables describe intermediate consumption
flows in 20 countries (not all OECD members) between
41 industries. The I/O data were initially collected
by national statistical agencies, who followed country-specific
practices for partitioning the business sector into industries
and measuring flows. The OECD converted country
data sets into a uniform 41-industry system to make international
comparisons possible.

The countries are listed in Table 1 and the industries
in Table B.2. One industry, "Private households with employed
persons", was excluded from analysis because it
was poorly represented in the data, with only 3 out of
20 countries (Australia, Japan, and Korea) providing any
data for it. This industry represents the production activity
of cooks, butlers, chauffeurs, gardeners, nannies, etc.
and does not make a significant contribution to flows in
any of the 3 countries where data is available.

Table 1: Country data statistics.

Country          Year     Num. industries in data  Fraction self-flows  Completeness
Australia        1994-95  38                       0.215                0.999
Brazil           1996     30                       0.240                0.998
Canada           1997     34                       0.232                0.969
China            1997     38                       0.238                0.943
Czech Republic   1995     40                       0.292                0.965
Denmark          1997     39                       0.179                0.957
Finland          1995     35                       0.274                0.977
France           1995     39                       0.285                0.776
Germany          1995     36                       0.228                0.995
Greece           1994     36                       0.168                0.929
Hungary          1998     36                       0.237                1.000
Italy            1992     37                       0.247                0.854
Japan            1995     40                       0.219                0.818
Korea            1995     39                       0.253                0.888
Netherlands      1995     38                       0.260                0.907
Norway           1997     40                       0.204                0.999
Poland           1995     35                       0.270                0.998
Spain            1995     39                       0.225                0.961
United Kingdom   1998     40                       0.286                0.949
United States    1997     39                       0.238                0.994

Because the I/O tables of individual countries differed
in both the number of industries and the boundaries between
them, the translation step between the national system
and SNA involved undesired splits and mergers that
affect the size of flows and industries. When an industry
defined by the source country overlapped two or more of
the industries defined by the OECD, the OECD was forced
to choose which OECD industry to assign the source industry
to. As a result, some industries in the OECD data
represent more than their intended scope of production
activities, while others represent less. In many instances,
such mergers caused entire industries to be completely subsumed
under other industries. Table 1 lists the number of
industries represented after all mergers are taken into account.

4. Network characteristics 

4.1. Notation

Let A be the adjacency matrix for the money flows
between industries. An element A_ij denotes the flow from
industry j to industry i:

$$A_{ij} = \text{flow from } j \text{ to } i. \tag{3}$$

Self links, representing payments of an industry to itself,
are permitted.

In addition to flows between nodes, an industrial network
has in-flows entering the network from outside, and
out-flows exiting the network (Fig. 2). As explained in
Section 2, the in-flows correspond to final consumption,
capital purchases, and export revenues. The out-flows
correspond to value added and import expenditures. Let the
sum over in-flows to each industry be denoted by the
in-vector U, and let the sum over out-flows from each industry
be denoted by the out-vector V:

$$U_i = \text{flow from outside to } i \tag{4}$$

$$V_i = \text{flow from } i \text{ to outside} \tag{5}$$

Flow is conserved at all nodes because of the definition
of value added, as described in Section 2. At each node i,
flow in equals flow out. Borrowing from ecology, we will
refer to the flow into/out of node i as its throughflow T_i:

$$T_i = U_i + \sum_j A_{ij} = \sum_j A_{ji} + V_i. \tag{6}$$

Summing over all nodes, the throughflow of the whole business
sector is

$$Q = \sum_i T_i = \sum_i U_i + \sum_{ij} A_{ij} = \sum_{ij} A_{ji} + \sum_i V_i. \tag{7}$$

This equation is the same as Eq. (1), with A_ij corresponding
to intermediate consumption and the other terms
corresponding to V_i or U_i. In economic terms, the
total throughflow Q represents gross output (the total of
all sales by the business sector) plus imports and business
taxes.
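The bookkeeping of Eqs. (3)-(7) can be illustrated with a small numerical sketch. The 3-industry matrix and in-flow vector below are hypothetical; the out-flows V_i are derived as the balancing item, mirroring the role of value added in the I/O tables.

```python
# Toy 3-industry network; all numbers are hypothetical.
# A[i][j] is the flow from industry j to industry i (Eq. (3)).
A = [
    [5.0, 2.0, 1.0],
    [3.0, 4.0, 6.0],
    [1.0, 7.0, 2.0],
]
# U[i]: flow into industry i from outside the network (Eq. (4)).
U = [10.0, 9.0, 14.0]

n = len(A)

# Throughflow computed on the in-flow side of Eq. (6).
T = [U[i] + sum(A[i][j] for j in range(n)) for i in range(n)]

# V[i] is fixed by requiring that flow out of each node equals flow in.
V = [T[i] - sum(A[j][i] for j in range(n)) for i in range(n)]

# Total throughflow of the business sector (Eq. (7)).
Q = sum(T)

# Both sides of Eq. (7) give the same total.
total_A = sum(sum(row) for row in A)
assert sum(U) + total_A == total_A + sum(V) == Q

print(T, V, Q)
```

Note that node-level conservation forces the external in-flows and out-flows to balance in aggregate (here both sum to 33), even though they differ industry by industry.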

4.2. Topology 

Figure 2: Simplified network structure and notation.

At the level of aggregation used in our data, industrial
networks are nearly complete graphs, typically with more
than 90% of all possible flows having non-zero weight (Table 1).

The high degree of completeness is only a feature of
highly aggregated I/O tables. Carvalho, studying use tables^2
with approximately 500 industries, notes that the
network is only 18% complete at that level of aggregation
[4].
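The two summary statistics reported in Table 1 can be sketched as follows. The definitions used here are assumptions made for illustration: completeness is taken as the fraction of all n x n possible flows (self-flows included) with non-zero weight, and the self-flow fraction as the share of total inter-industry flow weight lying on the diagonal. The matrix is hypothetical.

```python
def completeness(A):
    """Fraction of the n*n possible flows (self-flows included) that are non-zero."""
    n = len(A)
    nonzero = sum(1 for row in A for w in row if w != 0)
    return nonzero / (n * n)


def self_flow_fraction(A):
    """Share of total inter-industry flow weight on the diagonal (assumed definition)."""
    total = sum(sum(row) for row in A)
    diagonal = sum(A[i][i] for i in range(len(A)))
    return diagonal / total


# Hypothetical 3-industry flow matrix with one missing flow.
A = [
    [5.0, 2.0, 1.0],
    [3.0, 4.0, 6.0],
    [0.0, 7.0, 2.0],
]

print(completeness(A), self_flow_fraction(A))
```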

4.3. Flow weight distribution

The magnitudes of money flows in different countries
differ because they are expressed in different currencies
and their economies vary in size. To make flow weights
comparable across countries, we normalize them by the
total throughflow of the country:

$$a^c_{ij} = \frac{A^c_{ij}}{Q^c}, \tag{8}$$

where Q^c is the throughflow of country c and a^c_ij is the
normalized flow weight of country c.

The distributions of the normalized flow weights for all
20 countries are shown in Fig. 3. These distributions cover
a wide range, with largest and smallest weights separated
by 5 to 8 orders of magnitude, depending on the country.
The flow weight distribution is heavy-tailed and shows
significant curvature on log-log axes. It behaves very similarly
for different countries throughout much of its range.
At lower weights, the various country distributions diverge
from each other to some extent.

Figure 3: (Color online) Weight distributions of 20 countries
studied. The dashed line is the best fit Weibull distribution
to the pooled data from all 20 countries. Inset:
a_ij vs. a_ji for Spain.

^2 Use tables are a related data set that shows the expenditure of
each industry on individual commodities. Use tables are similar to
I/O tables and are used in their construction.

The weight distributions are similar to both the Weibull,

$$p(x) = \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^k}, \tag{9}$$

and lognormal distributions,

$$p(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right]. \tag{10}$$

These two distributions are frequently difficult to distinguish
in empirical data [10]. A standard method for choosing
the better fit between them is to compare the log-likelihoods
from maximum likelihood fits of each distribution,
accepting the distribution with the higher log-likelihood
[11, 12]. Results are shown in Table B.3. Out of
20 countries, 11 are better described by a Weibull distribution
and 9 by a lognormal. We also run a pooled regression
under the assumption that the data follow approximately
the same distribution. The pooled regression favors the
Weibull and is shown as the dotted line in Fig. 3. In addition,
two other factors favor the Weibull. First, most
countries do not show clear evidence of non-monotonic
behavior, which would occur under a lognormal. Finland
and Hungary are exceptions, showing a small amount of
non-monotonicity. Second, the Weibull tends to overestimate
the occurrence of the smallest flows, while the lognormal
tends to underestimate it. It is more likely that the smallest
flows would be underrepresented in the data due to
incomplete sampling rather than being overrepresented.
Because the network is simultaneously directed and
nearly complete (at this level of aggregation), almost every
flow a_ij in the network has a reciprocating flow a_ji of
non-zero weight. The inset of Fig. 3 plots weights against
reciprocating weights for the United States I/O network,
with similar results for other countries. The correlation
between off-diagonal elements is low (with typical correlation
coefficients in the range ρ = 0.1 to 0.4). In many cases,
a flow is several orders of magnitude larger or smaller than
the reciprocating flow, indicating a high degree of asymmetry
in the network. This is not surprising, since for most
pairs of transacting industries, one industry is primarily
the supplier and the other primarily the user.

The external flows, U_i and V_i, between an industry i
and other sectors of the economy are generally much larger
than flows between i and other industries, and are comparable
in size to the whole throughflow T_i. In Fig. 4, we
plot the densities of U_i/T_i and V_i/T_i. The first quantity
is the fraction of money in-flows received from final consumption
sales, sales of capital goods, and exports. (That
is, all non-intermediate categories of receipts.) The second
quantity is the fraction of money out-flows paid to
value-added and imports (all non-intermediate categories
of expenditures). The density of U_i/T_i is spread out across
the whole interval [0, 1]. This mainly reflects the large variation
among industries in how directly they service final
consumption, which is the most important component of
U_i. In contrast, the density of V_i/T_i is peaked, roughly
around 0.6. This means that industries are more similar
with respect to how much they spend on payments to the
household sector than in how much they receive from it.
This suggests that while industries differ significantly in
where they lie on production chains, they have somewhat
similar labor needs in monetary terms.

Figure 4: (Color online) Density of U_i/T_i and V_i/T_i.
Country lines (solid) were estimated using kernel density
smoothing. The dashed line represents the pooled data.

Figure 5: (Color online) The throughflow distributions of
all 20 countries studied.

4.4. Node throughflow distribution

Node strength generalizes the concept of node degree
to weighted networks. Since the network is directed, each
node i has both an in-strength and an out-strength, defined
as the sum of either the in-flows or out-flows incident
on i. These sums are equal in this network due to flow conservation,
so there is only one quantity to keep track of, which
we refer to as the throughflow T_i of node i (Eq. (6)). As
was done for link weights, we normalize node throughflows
to render them comparable between countries:

$$t_i = \frac{T_i}{Q}, \tag{11}$$

where Q is the total throughflow of the country.
The quantity t_i measures the size of industry i as the fraction
of money flowing through industry i.

The throughflow distributions of all 20 countries are
shown in Fig. 5. The distribution is similar from country
to country and is approximately exponential.

Table B.2 shows the sizes of the 40 industries recognized
in the OECD data. Under the OECD's partitioning
of industries, the five largest industries are



• wholesale and retail trade 

• construction 

• real estate activities 

• food, beverages, and tobacco 

• public administration and defense. 

The industries most likely to export are 

• office, accounting, and computing machinery 

• aircraft and spacecraft 

• radio, television, and communication equipment 

• building and repairing of ships and boats 

• motor vehicles, trailers, and semi-trailers. 

Unsurprisingly, the least likely to export are 

• real estate 

• health and social work 

• public administration and defense 

• education 

• construction, 

all industries whose products are not easily traded across 
national borders. The industries receiving the most rev- 
enue from final demand are quite similar: 

• public administration and defense 

• education 

• health and social work 

• construction 

• real estate. 

The industries least likely to receive revenue from final 
demand are 

• iron & steel 

• non-ferrous metals 

• mining and quarrying 



• other non-metallic mineral products

• rubber and plastic products.^3

4.5. Community structure 

In addition to knowing the statistics of flows and in- 
dustry sizes, we would like to know whether industries 
cluster in any particular way. Such clusters are usually 
referred to as "communities". Many methods exist for 
finding communities in networks [14, 15]; here, we apply
the method of modularity optimization [16, 17]. Modularity
maximization involves searching for partitions of the
network into communities that yield high values of the 
modularity Q over all possible partitions of the network. 
Since our network is directed, we use the directed gener- 
alization of modularity [18], 



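$$Q(c_1, \ldots, c_n) = \frac{1}{m} \sum_{ij} \left[ a_{ij} - \frac{s_i^{\mathrm{in}} s_j^{\mathrm{out}}}{m} \right] \delta(c_i, c_j). \tag{12}$$

Here, c_i is the community that node i belongs to, m =
Σ_ij a_ij is the total weight of all edges, and s_j^out = Σ_i a_ij
and s_i^in = Σ_j a_ij. The Kronecker delta function δ(k, l) = 1 if
k = l and 0 otherwise. The modularity gives the total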
weight of edges within communities minus the expected 
weight under a null model. The modularity function scores 
a given partition of the nodes into groups; the task then 
is to search over the many possible partitions of the net- 
work and find the one with the highest score. In prac- 
tice, the number of partitions is usually extremely large, so 
that only a small fraction can be examined directly. This 
has led to many proposals for algorithms that attempt to 
search the space of partitions efficiently for high values of 
Q rather than find the global maximum [151 E] • 
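To make the scoring function concrete, the sketch below evaluates a directed, weighted modularity for a given partition. It assumes the form Q = (1/m) Σ_ij [a_ij - s_i^in s_j^out / m] δ(c_i, c_j), with a[i][j] the weight of the flow from j to i; the function name and the 4-node matrix are hypothetical, and the code only scores partitions, it does not search over them.

```python
def directed_modularity(a, communities):
    """Directed weighted modularity of a partition.

    a[i][j] is the weight of the flow from node j to node i;
    communities[i] is the community label of node i.
    """
    n = len(a)
    m = sum(sum(row) for row in a)                                   # total edge weight
    s_in = [sum(a[i][j] for j in range(n)) for i in range(n)]        # row sums
    s_out = [sum(a[i][j] for i in range(n)) for j in range(n)]       # column sums
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += a[i][j] - s_in[i] * s_out[j] / m
    return q / m


# Hypothetical network: nodes {0, 1} and {2, 3} trade heavily within each pair.
a = [
    [0.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 4.0],
    [1.0, 0.0, 4.0, 0.0],
]

q_good = directed_modularity(a, [0, 0, 1, 1])   # matches the strong pairs
q_bad = directed_modularity(a, [0, 1, 0, 1])    # splits every strong pair
q_one = directed_modularity(a, [0, 0, 0, 0])    # everything in one community
print(q_good, q_bad, q_one)
```

Placing all nodes in a single community always scores zero, since the observed and expected weights then both sum to m.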

Recent work has shown that the modularity function Q 
admits a large number of high-scoring partitions that are 
not necessarily similar [19]. As a result, different searches 
may arrive at different high-scoring partitions. Determin- 
istic algorithms in particular are problematic because they 
fail to show the many alternative partitions. To address 
this problem, we use a stochastic search algorithm based 
on simulated annealing that returns a different high-scoring 
partition in each run. We repeat the algorithm many 
times, collect an alternative partition from each run, and 
compare them to test their robustness from run to run. 

^3 See Chenery & Watanabe [13] for a classification of industries
based on the fraction of revenues from intermediate sales and the
fraction of expenditures on intermediate goods. They use the first
fraction to measure how "final" versus "intermediate" an industry is.
They use the second to determine whether an industry is "primary"
or "manufacturing". Using these two dimensions, they classify industries
into four rough categories.

Specifically, we use the following simple procedure. For
each country, we run the simulated annealing algorithm
100 times and extract 100 high-modularity partitions. From
these partitions we produce a coclassification matrix [20]
with elements p_ij ∈ [0, 1] equal to the frequency with which
node i is grouped with node j. If certain nodes or groups
of nodes are frequently grouped together, they will appear
as blocks of high frequencies in the coclassification matrix;
if the groups are highly variable, then no particular part
of the matrix will accumulate a high value.
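The construction of the coclassification matrix from a set of partitions can be sketched as follows. The function name and the four example partitions are hypothetical stand-ins for the output of repeated search runs; only co-membership matters, so the community labels need not be consistent from run to run.

```python
def coclassification_matrix(partitions):
    """Return p[i][j], the fraction of runs in which nodes i and j share a community.

    partitions is a list of label lists, one per search run.
    """
    runs = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Four hypothetical runs on a 4-node network. The last run uses
# swapped labels but describes the same grouping as the first.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

p = coclassification_matrix(partitions)
for row in p:
    print(row)
```

In this toy case nodes 0 and 1 are always grouped together (p = 1), nodes 2 and 3 in three runs out of four (p = 0.75), and node 2 joins nodes 0 and 1 only once (p = 0.25).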

For the purpose of community finding, we set self-flows
a_ii of industries to zero, since these flows may reduce the
resolution of the method. This happens because including
self-flows increases m in Eq. (12), decreasing the null
model "penalty term" s_i^in s_j^out/m. This makes mergers between
communities that we would like to distinguish more
favorable, since it is then easier for a link between two
industries to exceed the null model penalty term. A potential
drawback of excluding self-flows is that if there are
industries that should be classified as singleton communities,
our method will not find them, because the associated
term a_ii - s_i^in s_i^out/m in Eq. (12) can only contribute negatively
to Q. However, in return we gain the benefit of more
effectively resolving communities between two or more industries.
This tradeoff is acceptable, since the communities
we are interested in are inter-industry ones. In fact,
we find similar results whether self-flows are excluded or
not, though we only show the results based on excluding
self-flows.

Figures 6a-c show the coclassification matrices for Australia,
China, and the United States. These figures show
the level of variation possible within countries from one
simulated annealing run to another. Although both the
communities and their stability varied somewhat from country
to country, different countries nevertheless tended toward
similar groupings corresponding to food industries
(rows/columns 1-3), chemical industries (4-6), manufacturing
industries (7-22), service industries (23-38), and energy
industries (39-41). Unsurprisingly, industries had a
higher tendency to transact with other industries of similar
type.

To study this common tendency more closely, we constructed
the average CCM of all 20 countries. The result
is another CCM (Fig. 6d), whose (i, j)th element now indicates
the frequency with which industries i and j were
grouped together out of 2000 search runs (100 per country).
Overall, the five-way grouping above performs well as
a coarse-grained description of the community structure.

Going beyond this quick description, we can also study 
the matrix in Fig. 6d for clues of hierarchical community
structure [ro", "W. Such structure arises in the CCM be- 
cause industries with ambiguous community membership 
may switch back and forth across a community boundary 
between different runs of the search algorithm. 

For example, the "transport and storage" industry may
be grouped with service industries in one run, and with
energy industries in another. The two runs may be different
runs for the same country or for two different countries,
as in the case of Fig. 6d. An industry that switches
back and forth between one group and another will appear
"smeared" across both groups. This indeed occurs
for "transport and storage" (i = 34). Other industries
that show this straddling behavior are "hotels and restaurants"
(i = 3, straddles service-food border), "manufacturing
NEC, recycling" (i = 7, chemical-manufacturing),
"office, accounting, and computing machinery" (i = 21,
manufacturing-service), "aircraft and spacecraft" (i = 22,
manufacturing-service), and "research and development"
(i = 37, manufacturing-service).

Figure 6: (Color online) Coclassification matrices (CCMs) giving the probability of two industries being grouped in the
same community. Rows and columns correspond to the 40 economic industries in Table B.2. (a), (b), and (c): CCMs for
Australia, China, and United States. (d): Average CCM of all 20 countries in Table 1 and dendrogram showing results of
hierarchical clustering. The vertical axis of the dendrogram measures clustering probabilities p_AB = 1 - d_AB.

We also observe weak cogrouping at a larger scale, beyond
that of single straddler industries. To study these
grouping patterns, we use hierarchical clustering methods.
We define the distance between industries to be

$$ d_{ij} = 1 - p_{ij}, \tag{13} $$

where $p_{ij} \in [0, 1]$ is the probability with which i cogroups
with j. To create a hierarchical tree, we use agglomerative
clustering with the average linkage criterion. We find similar
results using other distances and linkage criteria. We
construct a tree by joining industries one-by-one, starting
with the closest pair of industries and ending with the
most separated. Distances between clusters of industries
are defined as

$$
\begin{aligned}
d_{AB} &= \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} d_{ij} && (14) \\
       &= \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} (1 - p_{ij}) && (15) \\
       &= 1 - p_{AB}, && (16)
\end{aligned}
$$

where $p_{AB} = \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} p_{ij}$ is the
probability that a randomly picked pair from clusters A and B
are cogrouped. This choice of cluster distance is known as the
"average linkage criterion", and in the present context enables a
simple interpretation of industry and cluster distances in
terms of probabilities. In Appendix A we discuss
properties of the distance function Eq. (13).
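The clustering procedure can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the code used for Fig. 6: the names `average_linkage` and `chebyshev_distance` and the toy matrix `p` are hypothetical. The second function checks, on the toy matrix, the Appendix A identity that $1 - p_{ij}$ equals the largest absolute difference between columns i and j.

```python
def average_linkage(p):
    """Agglomerative clustering with distance d_ij = 1 - p_ij (Eq. 13).

    p: symmetric matrix (list of lists) of cogrouping probabilities.
    Returns the merges in order as (cluster_a, cluster_b, p_AB), where
    p_AB = 1 - d_AB is the mean cogrouping probability across the pair.
    """
    clusters = [(i,) for i in range(len(p))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                # Average linkage: mean distance over cross-cluster pairs.
                d_ab = sum(1.0 - p[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d_ab < best[0]:
                    best = (d_ab, x, y)
        d_ab, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, 1.0 - d_ab))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(tuple(sorted(a + b)))
    return merges


def chebyshev_distance(p, i, j):
    """Largest absolute difference between columns i and j of p."""
    return max(abs(p[i][k] - p[j][k]) for k in range(len(p)))


# Hypothetical CCM: nodes 0-1 and 3-4 always cogroup; node 2 straddles.
p = [
    [1.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 1.0, 0.5, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 1.0, 1.0],
    [0.0, 0.0, 0.5, 1.0, 1.0],
]
merges = average_linkage(p)
for a, b, p_ab in merges:
    print(a, b, round(p_ab, 3))
```

The third element of each merge is the height at which the two clusters would join on a dendrogram whose vertical axis measures $p_{AB}$. Ties between equally close pairs are broken by scan order in this sketch; a production implementation may break them differently.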



The results of hierarchical clustering are shown in the
dendrogram at the bottom of Fig. 6d. The dendrogram
supports the five-way division into food, chemical,
manufacturing, service, and energy industries. Further
interpretation has to proceed cautiously, but we observe the
following:

• The chemical and manufacturing industries appear 
to form a hierarchy in which the two communities are 
members of a larger "chemo-manufacturing" com- 
munity. 

• Two large sub-communities appear within manufacturing.
The industries in the upper left of the manufacturing
block of Fig. 6d (7-10) are "manufacturing NEC,
recycling", "wood and products of wood and cork",
"construction", and "other non-metallic mineral
products", and those in the bottom right
(11-22) are various metal and machinery industries.
The manufacturing group thus appears to divide into 
those industries that are structure-producing and those 
that are machinery-producing. 

• The machinery-producing industries further appear 
to contain two subsets. The first, industries 11- 
15, contains basic metal and machinery products. 
The second, industries 19-21, contains "radio, television,
and communication equipment", "medical, precision, and
optical instruments", and "office, accounting, and
computing machinery". These industries appear to follow
a "precision equipment" pattern. The four remaining
machinery-producing industries that are not in either of
these subsets (i = 16-18, 22) do not form their own
cluster, but are all transportation equipment industries
(ships and boats, motor vehicles, rail vehicles, aircraft
and spacecraft).

• The service community contains two well-connected 
subsets. One subset, "health and social work" and
"pharmaceuticals" (i = 35 and 36), is health-oriented.
The other subset is less clear cut; its seven members 
are "finance, insurance", "post and telecommunica- 
tions", "other business activities", "computer and 
related activities", "other community, social, and 
personal services", "education", and "pulp, paper, 
paper products, printing, and publishing". Roughly,
these sectors follow an "information" theme. 

Although these groupings represent increased tendencies
for intra-group transactions, the hierarchical structure
given by the dendrogram in Fig. 6 oversimplifies the
community structure of the network somewhat. Hierarchical
clustering forces hierarchical structure even where
none exists [21], and the actual clustering behavior may
be more nuanced. The CCM displays substantial overlap
between communities that is not apparent from the
dendrogram in Fig. 6d. For example, the food and chemical
industries show some tendency to cogroup; in certain
countries (e.g. Australia) this cogrouping is strong.
This behavior suggests an alternative hierarchy in which
the two communities are members of a larger "agrochemical"
community, or equivalently, overlap with the
chemo-manufacturing community. As a second example, the
service community as a whole shows overlap with part of the
manufacturing community. The particular manufacturing
industries overlapped tend to be ones further along the
supply chain (construction, radio, computer, medical,
aircraft) rather than basic materials industries (metals,
fabricated metal products, other non-metal materials). These
particular manufacturing industries and the service
industries may constitute some larger definition of the service
community that includes its immediate suppliers.

It is also important to note that the communities at 
this level of aggregation are not mostly isolated clusters, 
but are more like perturbations on top of an otherwise 
strongly connected network. It is possible this behavior 
would change at lower levels of aggregation, with more nar- 
row industry definitions serving to isolate industries from 
irrelevant parts of the economy. 

5. Discussion 

Comparisons of national economies typically focus on 
their differences; it is less often appreciated that economies 
may have substantial amounts of shared structure. Chen- 
ery and Watanabe write, "The structure of production, as 
defined by the input-output model, is the result of the in- 
teraction of a variety of forces, some leading to uniformity 
among countries and others to diversity. To the extent 
that production in various countries is intended to satisfy 
biologically determined human needs, is based on the same 
body of technological knowledge, and is constrained by the 
physical world, we should expect similarity in structure. 
To the extent that there are, among countries, variations 
in the relative scarcity of capital, labor and raw materi- 
als, differences in levels of income and composition of final 
demand, and variation in the scale of production, we may 
expect diversity." [13] While differences are apparent from
statistics like GDP per capita or the export trade network, 
similarities are not yet well characterized. Such similarities 
can serve as constraints for theoretical and computational 
models of economies. 

Both for the construction of such theories and further 
empirical work, the level of aggregation is important. Unlike
other networks where the meaning of a node is clear
(as a person, city, router, web page, species, etc.), the 
meaning of nodes as industries is necessarily ambiguous 
and subject to arbitrary decisions on the part of the sta- 
tistical agencies collecting economic data. These ambigu- 
ities are not drawbacks of the data per se, but rather re- 
flect fundamental ambiguities in the distinctions between 
products, though they sometimes also reflect the limited 
resources of the statistical agencies. Because of this ambi- 
guity, it is important for future theoretical and empirical 
work to account for the way results should change at dif- 
ferent levels of aggregation. 

Figure 7: (Color online) The industry money flow network of the
United States in 1997. Nodes are colored according to the
communities identified in Fig. 6d. The size of a node corresponds
to its throughflow (Eq. (6)). External flows U and V are omitted
for clarity. To reduce picture file size, only flows larger than a
threshold fraction of the largest flow are displayed. Remaining
flows represent about 57% of the 40^2 = 1600 possible links. The
true size of many of these flows can be best seen online by
zooming in. No intermediate consumption data was available for the
"Public administration and defense" industry for the U.S., so it
appears as an isolated node.

A useful way to gauge the aggregation level of an industry
network is to look at the amount of "self-flow" in the
network. Self-flow represents transactions between firms
that are classified within the same industry. Although
these firms may produce different products, they are not
different enough for them to have fallen into different
industry bins. In this case, the industry partitioning scheme
is too coarse-grained to differentiate them. The fraction of
all intermediate flows that are self-flows can serve
as a measure of the aggregation level of an industry
network data set. For our data, this number varies between
0.15 and 0.30; that is, some 15 to 30% of inter-industry
money flows are really transactions of an industry with
itself, reflecting the high level of aggregation of our data.
Individual industries with large self-flows represent good
candidates for subdivision in future I/O tables (Table B.1).
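As a minimal sketch of this aggregation measure, the self-flow fraction is the sum of the diagonal of the inter-industry flow matrix divided by the sum of all its entries. The function names and the 3-industry matrix below are hypothetical and serve only to illustrate the arithmetic.

```python
def self_flow_fraction(flows):
    """Share of total intermediate flow that lies on the diagonal.

    flows[i][j] is the money flow between industries i and j; the
    direction convention does not affect the result.
    """
    total = sum(sum(row) for row in flows)
    diagonal = sum(flows[i][i] for i in range(len(flows)))
    return diagonal / total


def rank_by_self_flow(flows):
    """Industry indices ordered from largest to smallest self-flow."""
    return sorted(range(len(flows)), key=lambda i: flows[i][i], reverse=True)


# Hypothetical flow matrix for three industries (arbitrary money units).
toy_flows = [
    [12, 20, 10],
    [15, 5, 15],
    [10, 10, 3],
]
fraction = self_flow_fraction(toy_flows)
ranking = rank_by_self_flow(toy_flows)
print(fraction, ranking)
```

Industries at the head of the ranking are the ones whose internal transactions dominate, and so would be the first candidates for subdivision under the reasoning above.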

6. Conclusions 

Network methods are useful for studying the relation- 
ships between industries. Here, we have applied them to 
flows of money between industries. These networks are 
weighted, directed, dense, and contain self-links. We have 
characterized the flow weight and industry size distribu- 
tions, identifying functional forms to serve as targets for 
theoretical models. We have examined the community 
structure of industries, finding groups corresponding to
food, chemical, manufacturing, service, and energy industries,
as well as nested sub-groups corresponding to finer
categories of industries. Applying network methods to in- 
dustrial money flows involves challenges not encountered 
in other network data sets, so to aid other researchers we 
have provided a brief introduction to the concepts and def- 
initions of national accounting, as well as the measurement 
basis and interpretation of money flows.

Appendix A. Hierarchical clustering with the overlap distance

Define the overlap distance between nodes as

$$ d_{ij} = 1 - p_{ij}, \tag{A.1} $$

where $p_{ij}$ is the probability that nodes i and j are grouped
in the same community. Since $p_{ij}$ is the probability that
i and j are cogrouped, $d_{ij}$ is simply the probability that i
and j are not cogrouped. To determine distances between
clusters of nodes, let

$$
\begin{aligned}
d_{AB} &= \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} d_{ij} && (A.2) \\
       &= \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} (1 - p_{ij}) && (A.3) \\
       &= 1 - p_{AB}, && (A.4)
\end{aligned}
$$

where $p_{AB} = \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} p_{ij}$ is the
probability that a randomly picked pair from clusters A and B
are cogrouped. This choice of $d_{AB}$ is known as the "average
linkage criterion". In the present context it enables a simple
interpretation of both node and cluster distances in terms of
probabilities.

The elements $p_{ij}$ of the coclassification matrix cannot
take on arbitrary values; the laws of probability impose
interdependent constraints on matrix elements. Given the
cogrouping probabilities $p_{ik}$ and $p_{jk}$ of i and j with some
third node k, one can show that $p_{ij}$ is bounded above and
below as

$$ \max(0,\, p_{ik} + p_{jk} - 1) \le p_{ij} \le 1 - |p_{ik} - p_{jk}|. \tag{A.5} $$

We can use these bounds to show two useful properties of
the overlap distance. First, using the lower bound, one
can show that

$$ d_{ij} \le d_{ik} + d_{jk}; \tag{A.6} $$

i.e. the overlap distance obeys the triangle inequality.

Second, using the upper bound one may show that the
overlap distance is equal to the Chebyshev or $L_\infty$ distance
applied to columns of the coclassification matrix:

$$ d_{ij} = \max_k |p_{ik} - p_{jk}|. \tag{A.7} $$

The distance is the largest absolute difference between
elements of columns i and j. Rewriting the upper bound
as $|p_{ik} - p_{jk}| \le 1 - p_{ij}$, we see that $\max_k |p_{ik} - p_{jk}|$ is at
most $1 - p_{ij}$. To see that they are in fact equal, let k = i
and note that $|p_{ik} - p_{jk}| = |p_{ii} - p_{ji}| = 1 - p_{ij}$. Since the
argument of $\max_k$ achieves the largest possible value for
at least one value of k, $d_{ij} = \max_k |p_{ik} - p_{jk}| = 1 - p_{ij}$.

Appendix B. Industry flow statistics
Table B.3: Comparison of Weibull and lognormal fits to flow weight distribution.

                  Weibull                 Lognormal
Country           λ              k        m        s       Δ log-likelihood   Best fit
Australia         8.81 × 10^-5   0.408    -10.7    2.96      67.2             Weibull
Brazil            1.88 × 10^-4   0.485    -9.71    2.44      36.7             Weibull
Canada            1.21 × 10^-4   0.483    -10.1    2.34       2.99            Weibull
China             1.58 × 10^-4   0.471    -9.9     2.44      21.9             Weibull
Czech Republic    1.14 × 10^-4   0.471    -10.2    2.49      48.4             Weibull
Denmark           6.83 × 10^-5   0.433    -10.8    2.57     -15.3             lognormal
Finland           1.31 × 10^-4   0.489    -9.99    2.12     -71.3             lognormal
France            1.49 × 10^-4   0.518    -9.86    2.20      12.3             Weibull
Germany           1.44 × 10^-4   0.514    -9.87    2.12     -30.1             lognormal
Greece            4.14 × 10^-5   0.343    -11.7    3.45      35               Weibull
Hungary           1.34 × 10^-4   0.541    -9.87    1.94     -63.4             lognormal
Italy             1.29 × 10^-4   0.461    -10.2    2.56      37               Weibull
Japan             1.14 × 10^-4   0.423    -10.5    3.07     124               Weibull
Korea             1.07 × 10^-4   0.442    -10.4    2.58      11.8             Weibull
Netherlands       1.05 × 10^-4   0.494    -10.2    2.00    -120               lognormal
Norway            7.70 × 10^-5   0.463    -10.6    2.32     -53.2             lognormal
Poland            1.50 × 10^-4   0.490    -9.88    2.23     -24.2             lognormal
Spain             9.93 × 10^-5   0.469    -10.3    2.33     -25.2             lognormal
United Kingdom    8.31 × 10^-5   0.439    -10.6    2.63      29.4             Weibull
United States     9.23 × 10^-5   0.440    -10.5    2.52     -26.2             lognormal
pooled            1.08 × 10^-4   0.456    -10.3    2.54     545.6             Weibull
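A comparison of this kind can be sketched as follows. The sketch assumes maximum-likelihood fits of both distributions and takes the log-likelihood difference as Weibull minus lognormal, which matches the sign pattern in the table (positive values go with a Weibull best fit). The fitting procedure actually used for the table is not specified here, and the function names and synthetic sample are hypothetical.

```python
import math
import random


def fit_lognormal(x):
    """Maximum-likelihood lognormal parameters (m, s) of positive data."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    s = math.sqrt(sum((l - m) ** 2 for l in logs) / len(logs))
    return m, s


def loglik_lognormal(x, m, s):
    return sum(-math.log(v) - math.log(s) - 0.5 * math.log(2 * math.pi)
               - (math.log(v) - m) ** 2 / (2 * s ** 2) for v in x)


def fit_weibull(x, tol=1e-10):
    """Maximum-likelihood Weibull scale and shape (lam, k) by bisection."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)

    def g(k):
        # Score equation for the shape; it increases with k.
        w = [v ** k for v in x]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = (sum(v ** k for v in x) / len(x)) ** (1.0 / k)
    return lam, k


def loglik_weibull(x, lam, k):
    return sum(math.log(k) - math.log(lam) + (k - 1) * (math.log(v) - math.log(lam))
               - (v / lam) ** k for v in x)


# Synthetic sample (not real flow data): Weibull with scale 1e-4, shape 0.45.
rng = random.Random(0)
sample = [v for v in (rng.weibullvariate(1e-4, 0.45) for _ in range(2000)) if v > 0]

lam, k = fit_weibull(sample)
m, s = fit_lognormal(sample)
delta = loglik_weibull(sample, lam, k) - loglik_lognormal(sample, m, s)
best = "Weibull" if delta > 0 else "lognormal"
print(lam, k, m, s, delta, best)
```

Because both models have two parameters, comparing raw log-likelihoods is equivalent to comparing common penalized criteria such as AIC; whether a given difference is statistically meaningful would need a separate test.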